Abstract

We show how to map the states of an ergodic Markov chain to
Euclidean space so that the squared distance between states is the
expected commuting time. We find a minimax characterization of
commuting times, and from this we get monotonicity of commuting
times with respect to equilibrium transition rates. All of these results
are familiar in the case of time-reversible chains, where techniques of
classical electrical theory apply. In presenting these results, we take
the opportunity to develop Markov chain theory in a 'conformally
correct' way.



1 Overview 

In an eye-opening paper, Chandra, Raghavan, Ruzzo, Smolensky, and Tiwari 
[1] revealed the central importance of expected commuting times for the 
theory of time-reversible Markov chains. Here we extend the discussion to 
general, non-time-reversible chains. 

We begin by showing how to embed the states in a Euclidean space so 
that the squared distance between states is the commuting time. In the 
time-reversible case, Leibon et al. have used Euclidean embeddings to great 
effect as a way to visualize a chain, and reveal natural clustering of states. 
Our embedding theorem shows that non-time-reversible chains should be
amenable to the same treatment. 

Looking beyond the Euclidean embedding, we find a natural minimax 
characterization of commuting times. From this we get the monotonicity 
law for commuting times: If all equilibrium interstate transition rates are in- 
creased, then all commuting times are diminished. For time-reversible chains, 
this monotonicity law is an ancient and powerful tool. It is questionable how 
useful it will prove to be in the general case. 

In presenting these results, we will be taking a 'conformally correct'
approach to Markov chains. Briefly, a conformal change to a Markov chain
changes its equilibrium measure, but not its equilibrium transition rates. 
The opportunity to develop this conformally correct approach is at least as 
important to us as the particular results we'll be discussing here. 



2 The problem 



The commuting time $T_{ab}$ between two states $a$, $b$ of an ergodic Markov chain
is the expected time, starting from $a$, to go to $b$ and then back to $a$. Evidently
$T_{ab} = T_{ba}$ and

$$T_{ac} \le T_{ab} + T_{bc}.$$

Thus it might seem natural to think of $T_{ab}$ as a measure of the distance
between $a$ and $b$. But in fact it is most natural to think of $T_{ab}$ as the squared
distance between $a$ and $b$. The reason is that, as we will see, there is a
natural way to identify the states of the chain with points in a Euclidean
space having quadratic form $\|x\|^2$ such that for any states $a$, $b$ we have

$$T_{ab} = \|a - b\|^2.$$

Now that we are interpreting $T_{ab}$ as a squared distance, the inequality
$T_{ac} \le T_{ab} + T_{bc}$ tells us that

$$\|a - c\|^2 \le \|a - b\|^2 + \|b - c\|^2.$$

This means that all angles $\angle abc$ are acute (at least weakly: some might be
right angles).

Realizing commuting times as squared distances is straightforward for
time-reversible chains. Here's a sketch, meant only for orientation: We won't
rely on any of this below. Time-reversible chains correspond exactly to
resistor networks, with $T_{ab}$ corresponding to the effective resistance between $a$
and $b$. This effective resistance is the energy of a unit current flow from $a$
to $b$. The energy of a flow is its squared length with respect to the energy
norm on flows. If we associate to state $i$ the unit current flow from $i$ to
some arbitrary reference vertex (the 'ground'), then the difference between
the flows associated to $a$ and $b$ will be the unit current flow from $a$ to $b$,
having square norm $T_{ab}$.

The trick will be to extend this result to non-time-reversible chains. Now, 
it may in fact be the case that to any chain there corresponds a time-reversible 
chain having the same $T$, up to multiplication by a positive constant. This
would immediately take care of the extension beyond the time-reversible case.
It is easy enough to compute what the transition rates of this time-reversible 
chain would have to be, but we don't know that they are always positive. 
We leave this question for another day. 

Before proceeding, we should observe that the triangle inequality for 
squared lengths is not in itself a sufficient condition for realizability of a 
Euclidean simplex. It is sufficient for tetrahedra (four vertices in 3-space), 
but for five vertices we have the following counterexample. Take
This matrix is not realizable because the associated quadratic form with
is not positive definite: It has the eigenvalue $\frac{1}{2}(22 - \sqrt{523}) \approx -0.434597$.
Since we're going to see that commuting time matrices are always realizable, 
this means in particular that this matrix T cannot arise as the matrix of 
commuting times of a Markov chain. 



3 The short answer 

Below we will give the honest solution to this problem, developing in a thor- 
oughgoing way what we will call the 'conformally correct' approach to Markov 
chains. Here we just extract the answer to our embedding question, and 
present it in a way that should be immediately accessible to those famil- 
iar with the standard theory of Markov chains, as developed for example 
in Grinstead and Snell [6]. The only caveat is that we will be using tensor
notation, i.e. writing some indices up rather than down. You can look at
section 5 below for remarks about this, but if you prefer you can just view
this as an idiosyncrasy, as long as you bear in mind that $Z_i{}^j$ represents a
different array of numbers from $Z_{ij}$.

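Consider a discrete-time Markov chain with transition probabilities

$$P_i{}^j = \mathrm{Prob}(\text{next at } j \mid \text{start at } i).$$

Assume the chain is ergodic so there is a unique equilibrium measure $w^i$ with

$$\sum_i w^i P_i{}^j = w^j$$

and

$$\sum_i w^i = 1.$$

Define

$$\Delta^{ij} = w^i (I_i{}^j - P_i{}^j)$$

and note that

$$\sum_i \Delta^{ij} = \sum_j \Delta^{ij} = 0.$$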

Now define

$$Z_i{}^j = (I_i{}^j - w^j) + (P_i{}^j - w^j) + (P^{(2)}{}_i{}^j - w^j) + \ldots,$$

where $P^{(2)}{}_i{}^j = \sum_k P_i{}^k P_k{}^j$ represents the matrix square of $P_i{}^j$, and the elided
terms involve higher matrix powers. Using conventional matrix notation, if
we define $P^{(\infty)}{}_i{}^j = w^j$ we can write

$$Z = (I - P^{(\infty)}) + (P - P^{(\infty)}) + (P^{(2)} - P^{(\infty)}) + \ldots
   = (I - P + P^{(\infty)})^{-1} - P^{(\infty)}.$$

(Note that Grinstead and Snell [6] use the alternate definition
$Z = (I - P + P^{(\infty)})^{-1}$, which is less congenial but works just as well in this context.)
Set

$$Z_{ij} = \frac{1}{w^j} Z_i{}^j.$$

$Z_{ij}$ acts like an inverse to $\Delta^{ij}$ in the sense that for any $u^i$ with $\sum_i u^i = 0$, we
have

$$\sum_{jk} u^j Z_{jk} \Delta^{ki} = u^i$$

and

$$\sum_{jk} \Delta^{ij} Z_{jk} u^k = u^i.$$

Standard Markov chain theory tells us that the expected time $M_{ab}$ to hit
state $b$ starting from state $a$ is

$$M_{ab} = Z_{bb} - Z_{ab}.$$

So for the commuting time we have

$$T_{ab} = M_{ab} + M_{ba} = Z_{aa} - Z_{ab} - Z_{ba} + Z_{bb}.$$
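
As a concrete check of these formulas, here is a minimal sketch in Python using exact rational arithmetic. The 3-state chain, the helper names `inverse` and `equilibrium_and_Z`, and the device of reading off $w$ as the column sums of $(I - P + E)^{-1}$ (with $E$ the all-ones matrix) are illustrative assumptions, not something taken from the text.

```python
from fractions import Fraction as F

def inverse(A):
    """Gauss-Jordan inverse of a square matrix (exact, Fraction entries)."""
    n = len(A)
    M = [list(r) + [F(int(i == j)) for j in range(n)] for i, r in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                k = M[r][c]
                M[r] = [x - k * y for x, y in zip(M[r], M[c])]
    return [r[n:] for r in M]

def equilibrium_and_Z(P):
    """Return (w, Z): equilibrium measure w^i and lowered-index Z_ij."""
    n = len(P)
    eye = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    # w (I - P + E) = (1, ..., 1) for the all-ones matrix E, so w is the
    # vector of column sums of (I - P + E)^(-1).
    G = inverse([[eye[i][j] - P[i][j] + 1 for j in range(n)] for i in range(n)])
    w = [sum(G[i][j] for i in range(n)) for j in range(n)]
    # Mixed-index Z = (I - P + P_inf)^(-1) - P_inf, every row of P_inf being w;
    # dividing column j by w^j lowers the index.
    H = inverse([[eye[i][j] - P[i][j] + w[j] for j in range(n)] for i in range(n)])
    return w, [[(H[i][j] - w[j]) / w[j] for j in range(n)] for i in range(n)]

# Hypothetical non-reversible chain: 0 -> 1 -> 2, then 2 -> 0 or 1 with probability 1/2.
P = [[F(0), F(1), F(0)],
     [F(0), F(0), F(1)],
     [F(1, 2), F(1, 2), F(0)]]
w, Z = equilibrium_and_Z(P)
n = len(P)
M = [[Z[b][b] - Z[a][b] for b in range(n)] for a in range(n)]   # hitting times M_ab
T = [[M[a][b] + M[b][a] for b in range(n)] for a in range(n)]   # commuting times T_ab
```

For this chain the hitting times can also be found by hand (for instance $M_{01} = 1$ and $M_{10} = 4$), which gives an independent check on the values produced from $Z$.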

For a vector $x = (x_i)_{i=1,\ldots,n}$ define

$$\|x\|^2 = \sum_{ij} x_i \Delta^{ij} x_j.$$

Please note that this does not make $\Delta^{ij}$ the matrix of the quadratic form in
the usual sense, because in general $\Delta^{ij} \ne \Delta^{ji}$. The matrix of the form in the
usual sense is the symmetrized version $\frac{1}{2}(\Delta^{ij} + \Delta^{ji})$.
Because

$$\sum_i \Delta^{ij} = \sum_j \Delta^{ij} = 0$$

we have the key identity

$$\|x\|^2 = -\frac{1}{2} \sum_{ij} \Delta^{ij} (x_i - x_j)^2.$$

Recalling the definition of $\Delta^{ij}$ gives

$$\|x\|^2 = \frac{1}{2} \sum_{ij} w^i P_i{}^j (x_i - x_j)^2 \ge 0.$$

Thus the quadratic form $\|x\|^2$ is weakly positive definite, but not strictly so,
because it vanishes for constant vectors:

$$\|(c,\ldots,c)\|^2 = 0.$$

It becomes strictly positive definite if we identify vectors differing by a
constant vector:

$$(x_1,\ldots,x_n) \equiv (x_1 + c,\ldots,x_n + c).$$
This Euclidean space (vectors mod constant vectors, with the pushed-down 
quadratic form) is where we will embed our chain. 
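To get the embedding, map state $a$ to the vector

$$f(a) = (Z_{ai})_{i=1,\ldots,n}.$$

For the difference between the images of $a$ and $b$ we have

$$(f(a) - f(b))_i = Z_{ai} - Z_{bi} = \sum_k (\delta_a^k - \delta_b^k) Z_{ki},$$

with $\delta_a^k$ the Kronecker delta. We want to see that $f(a) - f(b)$ has square
norm $T_{ab}$.

From the generalized inverse relationship between $Z_{ij}$ and $\Delta^{ij}$ and the
fact that

$$\sum_k (\delta_a^k - \delta_b^k) = 0$$

we have

$$\sum_i (Z_{ai} - Z_{bi}) \Delta^{ij} = \sum_{ki} (\delta_a^k - \delta_b^k) Z_{ki} \Delta^{ij} = \delta_a^j - \delta_b^j.$$

So

$$\|f(a) - f(b)\|^2 = \sum_{ij} (Z_{ai} - Z_{bi}) \Delta^{ij} (Z_{aj} - Z_{bj})
 = \sum_j (\delta_a^j - \delta_b^j)(Z_{aj} - Z_{bj})
 = Z_{aa} - Z_{ab} - Z_{ba} + Z_{bb}
 = T_{ab}.$$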



There you have it. 
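
The embedding can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the 3-state chain and the helper names (`inverse`, `equilibrium_and_Z`, `sqnorm`) are hypothetical, and exact fractions are used so that the squared distances can be compared with the commuting times without rounding.

```python
from fractions import Fraction as F

def inverse(A):
    """Gauss-Jordan inverse of a square matrix (exact, Fraction entries)."""
    n = len(A)
    M = [list(r) + [F(int(i == j)) for j in range(n)] for i, r in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                k = M[r][c]
                M[r] = [x - k * y for x, y in zip(M[r], M[c])]
    return [r[n:] for r in M]

def equilibrium_and_Z(P):
    """Return (w, Z): equilibrium measure w^i and lowered-index Z_ij."""
    n = len(P)
    eye = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    G = inverse([[eye[i][j] - P[i][j] + 1 for j in range(n)] for i in range(n)])
    w = [sum(G[i][j] for i in range(n)) for j in range(n)]   # column sums
    H = inverse([[eye[i][j] - P[i][j] + w[j] for j in range(n)] for i in range(n)])
    return w, [[(H[i][j] - w[j]) / w[j] for j in range(n)] for i in range(n)]

# Hypothetical non-reversible chain, chosen only for illustration.
P = [[F(0), F(1), F(0)],
     [F(0), F(0), F(1)],
     [F(1, 2), F(1, 2), F(0)]]
n = len(P)
w, Z = equilibrium_and_Z(P)
Delta = [[w[i] * (int(i == j) - P[i][j]) for j in range(n)] for i in range(n)]

def sqnorm(x):
    """||x||^2 = sum_ij x_i Delta^{ij} x_j."""
    return sum(x[i] * Delta[i][j] * x[j] for i in range(n) for j in range(n))

def half_sum(x):
    """Right-hand side of the key identity: -1/2 sum_ij Delta^{ij} (x_i - x_j)^2."""
    return -sum(Delta[i][j] * (x[i] - x[j]) ** 2
                for i in range(n) for j in range(n)) / 2

f = Z   # state a is sent to the row f[a] = (Z_a0, ..., Z_a,n-1)
T = [[Z[a][a] - Z[a][b] - Z[b][a] + Z[b][b] for b in range(n)] for a in range(n)]
D = [[sqnorm([f[a][i] - f[b][i] for i in range(n)]) for b in range(n)]
     for a in range(n)]
x = [F(3), F(-1), F(2)]   # arbitrary test vector for the key identity
```

Here `D` holds the squared distances between the embedded states and `T` the commuting times computed from $Z$; on this example the two tables coincide.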



4 What just happened 

We want to explain the proof we have just given in more conceptual terms. 
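Let $V$ be a finite-dimensional real vector space, and $V^*$ the dual space,
consisting of linear functionals $V \to \mathbf{R}$. For $u \in V^*$, $x \in V$ write

$$\langle u, x \rangle_V = u(x)$$

for the natural pairing between $V$ and $V^*$. Identify $V$ with $V^{**}$ as usual:

$$\langle x, u \rangle_{V^*} = u(x) = \langle u, x \rangle_V.$$

To a map $f : V \to W$ we associate the adjoint map $f^* : W^* \to V^*$, such that
for $u \in W^*$, $x \in V$

$$\langle f^*(u), x \rangle_V = u(f(x)).$$

A bilinear form on $V$ arises from a linear map

$$\phi : V \to V^*$$

via

$$L_\phi(x,y) = \langle \phi(x), y \rangle_V.$$

The adjoint map

$$\phi^* : V \to V^*$$

yields the transposed bilinear form

$$L_{\phi^*}(x,y) = \langle \phi^*(x), y \rangle_V = \langle x, \phi(y) \rangle_{V^*} = \langle \phi(y), x \rangle_V = L_\phi(y,x).$$

If $\phi$ is invertible the inverse

$$\phi^{-1} : V^* \to V$$

yields the form $L_{\phi^{-1}}$ on $V^*$:

$$L_{\phi^{-1}}(u,v) = \langle \phi^{-1}(u), v \rangle_{V^*} = \langle v, \phi^{-1}(u) \rangle_V.$$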
The forms $L_{\phi^*}$ and $L_{\phi^{-1}}$ are conjugate, because

$$L_{\phi^{-1}}(u,v) = L_{\phi^*}(\phi^{-1}(u), \phi^{-1}(v)).$$

Going back the other way,

$$L_{\phi^*}(x,y) = L_{\phi^{-1}}(\phi(x), \phi(y)).$$

From these two equations, we get two distinct ways to conjugate $L_\phi$ to
$L_{\phi^{-1*}}$. Plugging $\phi = (\phi^{-1})^{-1}$ into the first and putting $(x,y)$ for $(u,v)$, we
get

$$L_\phi(x,y) = L_{\phi^{-1*}}(\phi(x), \phi(y)).$$

Plugging $\phi = (\phi^*)^*$ into the second we get

$$L_\phi(x,y) = L_{\phi^{-1*}}(\phi^*(x), \phi^*(y)).$$

Now putting $\phi^*$ for $\phi$ we see that in fact there were two ways to conjugate
$L_{\phi^{-1}}$ to $L_{\phi^*}$:

$$L_{\phi^*}(x,y) = L_{\phi^{-1}}(\phi(x), \phi(y)) = L_{\phi^{-1}}(\phi^*(x), \phi^*(y)).$$

Having two ways to conjugate $L_\phi$ to $L_{\phi^{-1*}}$ gives us an automorphism
$\phi^{-1} \circ \phi^*$ of $L_\phi$:

$$L_\phi(x,y) = L_\phi(\phi^{-1}(\phi^*(x)), \phi^{-1}(\phi^*(y))).$$

Along with $\phi^{-1} \circ \phi^*$ we also have the inverse automorphism $\phi^{-1*} \circ \phi$:

$$L_\phi(x,y) = L_\phi(\phi^{-1*}(\phi(x)), \phi^{-1*}(\phi(y))).$$

We could also consider powers other than $-1$ of our automorphism, but we
don't need to, because the conjugacy between $L_\phi$ and $L_{\phi^*}$ is canonical (in
the sense of being equivariant with respect to taking duals and inverses)
up to this factor of two. The difference between them, as measured by the
automorphism $\phi^{-1} \circ \phi^*$, measures the antisymmetry of $L_\phi$. It is destined to
play an important role in our future.

Looking now at the level of quadratic forms $Q_\phi(x) = L_\phi(x,x)$, everything
in sight is conjugate:

$$Q_\phi(x) = Q_{\phi^*}(x);$$

$$Q_{\phi^{-1}}(u) = Q_{\phi^{-1*}}(u) = Q_\phi(\phi^{-1}(u)) = Q_\phi(\phi^{-1*}(u)).$$

All this nonsense can be made much more concrete using matrices. Let
$V = \mathbf{R}^n$ and represent $x \in V$, $u \in V^*$ as column and row vectors respectively,
so that the pairing is just multiplying a row vector by a column vector:

$$\langle u, x \rangle_V = ux.$$

Denote transposition of matrices by $\star$. Write

$$L_\phi(x,y) = x^\star A y,$$

so that

$$\phi(x) = x^\star A = (A^\star x)^\star.$$

Now

$$\phi^{-1}(u) = (u A^{-1})^\star = A^{-1\star} u^\star,$$

so

$$L_{\phi^{-1}}(u,v) = \langle v, \phi^{-1}(u) \rangle_V = v A^{-1\star} u^\star = u A^{-1} v^\star.$$

Good!

Now to see the two conjugacies of $L_{\phi^*}$ with $L_{\phi^{-1}}$:

$$A^\star A^{-1} A = A^\star;$$

$$A A^{-1} A^\star = A^\star.$$

These combine to give two automorphisms of $L_\phi$:

$$(A^{-1} A^\star)^\star A (A^{-1} A^\star) = A A^{-1\star} A A^{-1} A^\star = A;$$

$$(A^{-1\star} A)^\star A (A^{-1\star} A) = A^\star A^{-1} A A^{-1\star} A = A.$$

Hmm. Why didn't we do it this way in the first place?

So, here's what happened with our Markov chain. We started with the
space $V = \mathbf{R}^n / \mathbf{1}$ with quadratic form $L_\phi(x,y) = \sum_{ij} x_i \Delta^{ij} y_j$, embedded the
states in $V^* = \mathbf{R}^n \perp \mathbf{1}$ with quadratic form $L_{\phi^{-1}}(u,v) = \sum_{ij} u^i Z_{ij} v^j$, and
proved that $L_{\phi^{-1}}$ is positive definite by showing that it is conjugate to $L_\phi$.

5 Tensor notation for Markov chains 

As you will already have noticed, we are using tensor notation, rather than 
trying to work within the confines of matrix notation, as is usual in the theory 
of Markov chains. For our purposes, a tensor may be viewed as an array 
where some of the indices are written as superscripts rather than subscripts. 
Thus, for example, we write the transition rates for a Markov chain as $P_i{}^j$,
and the equilibrium measure as $w^i$.

Where the indices of a tensor are placed makes a difference: Thus $Z_i{}^j$
represents a different array from $Z_{ij}$. We may 'raise' and 'lower' these indices
as is usual with tensors, though in this case the procedure is simpler than
usual, because to raise or lower an index $i$ we just multiply or divide by the
entries of $w^i$. Thus we get $Z_{ij}$ from $Z_i{}^j$ by lowering the index $j$:

$$Z_{ij} = \frac{1}{w^j} Z_i{}^j.$$

We get back to $Z_i{}^j$ from $Z_{ij}$ by raising the index $j$:

$$Z_i{}^j = w^j Z_{ij}.$$

We will still be able to use matrix notation to multiply matrices (two- 
index tensors) and vectors (one-index tensors). The beautiful thing is that 
when we do this, the indices take care of themselves, as long as the indices 
that get summed over when multiplying matrices are paired high with low. 
To show by example what this means, if we write C = AB, it will entail 
(among other things) that 

$$C_i{}^j = (AB)_i{}^j = \sum_k A_i{}^k B_k{}^j = \sum_k A_{ik} B^{kj}$$

and

$$C_{ij} = (AB)_{ij} = \sum_k A_i{}^k B_{kj} = \sum_k A_{ik} B^k{}_j.$$

Note. If you're familiar with the Einstein summation convention, be
aware that we don't use it here. It wouldn't work well in this context, because
we want to write $w^i Z_{ij}$ without automatically summing over $i$. Fortunately,
for our purposes, using the notation of matrix multiplication turns out to be 
even more convenient than the summation convention. 



6 What it means to be conformally correct 

We have said that we want our approach to be 'conformally correct'. Before 
we go further, a word about what this means. (Skip this if you don't care.) 
Conformal equivalence of Markov chains is most natural for continuous-time
chains. In that context two chains with transition rates $A_i{}^j$ and $B_i{}^j$ are
conformally equivalent if

$$B_i{}^j = \frac{1}{a_i} A_i{}^j$$

where all $a_i > 0$. Generally we will also want the additional condition that
$\sum_i w^i a_i = 1$ where $w^i$ is the equilibrium probability of being at $i$ for the $A$
chain. With this 'volume condition' the equilibrium probability of being at
$i$ for the $B$ chain will be $w^i a_i$ and

$$B^{ij} = w^i a_i B_i{}^j = w^i a_i \frac{1}{a_i} A_i{}^j = A^{ij}.$$

Thus while the raw transition rates $A_i{}^j$ are not conformal invariants, when we
raise the index $i$ we get a new array $A^{ij} = w^i A_i{}^j$ whose entries are conformal
invariants: They tell the rate at which transitions are made from $i$ to $j$ when
the chain is in equilibrium.
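
To make the invariance concrete, here is a small hedged sketch. It assumes the rates are stored as a generator matrix whose rows sum to zero (so the diagonal carries minus the total exit rate); that convention, the particular matrix `A`, the factors `a`, and the helper `left_apply` are all illustrative choices rather than anything fixed by the text.

```python
from fractions import Fraction as F

# Hypothetical continuous-time chain: generator A (rows sum to 0), equilibrium w.
A = [[F(-1), F(1), F(0)],
     [F(0), F(-1), F(1)],
     [F(1, 2), F(1, 2), F(-1)]]
w = [F(1, 5), F(2, 5), F(2, 5)]
a = [F(2), F(1), F(1, 2)]   # conformal factors a_i > 0 with sum_i w^i a_i = 1
n = 3

def left_apply(m, G):
    """Row vector m times matrix G: the rate of change of the measure m."""
    return [sum(m[i] * G[i][j] for i in range(n)) for j in range(n)]

B = [[A[i][j] / a[i] for j in range(n)] for i in range(n)]   # B_i^j = A_i^j / a_i
wB = [w[i] * a[i] for i in range(n)]                         # equilibrium of the B chain

raised_A = [[w[i] * A[i][j] for j in range(n)] for i in range(n)]    # A^{ij}
raised_B = [[wB[i] * B[i][j] for j in range(n)] for i in range(n)]   # B^{ij}
```

On this example `wB` is stationary for `B`, and the raised arrays `raised_A` and `raised_B` agree entry by entry, while the raw rates `A` and `B` differ.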

It is possible to talk about conformal equivalence of discrete time chains, 
but it is not as pleasant as for continuous-time chains. This is true so often 
in the theory of Markov chains! And yet, for simplicity, we want to talk 
about discrete-time chains. So our approach will be to do everything in such 
a way that the discussion would be conformally invariant when translated 
from discrete to continuous time. 

So that's what it means for chains to be conformally equivalent. As for 
'conformal correctness', we mean an approach that seeks to identify and 
emphasize quantities that are conformally invariant. And why should we do 
this? Because it will pay. 

7 Visualizing commuting times 

One way to determine the expected commuting time $T_{ab}$ between $a$ and $b$ is
to run the chain for a long time $T$ (beware of confusion!), paying attention
to when the chain is at $a$ or $b$ and ignoring other states. If $R$ is the number
of runs of $a$'s (which is within 1 of the number of runs of $b$'s), then

$$T_{ab} \approx T/R.$$

To keep track of $R$ we imagine painting our Markovian particle green when it
reaches $a$ and red when it reaches $b$. Let $r_{ab}$ be the equilibrium rate at which
red particles are being painted green. Ignoring end effects, over our long time
interval $T$, $R$ above is the number of times a red particle gets painted green,
thus roughly $T r_{ab}$, and it follows that

$$T_{ab} = \frac{1}{r_{ab}}.$$



This is an instance of the general principle from renewal theory that when 
events happen at rate r, the expected time between events is 1/r. 
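
The painting procedure translates directly into a simulation. The following is a rough sketch under stated assumptions: the 3-state chain is hypothetical (its exact commuting time between states 0 and 1 is 5), and the function name `estimate_commuting_time`, the step count, and the fixed seed are illustrative choices.

```python
import random

# Hypothetical 3-state chain used only for illustration.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.5, 0.0]]

def estimate_commuting_time(P, a, b, steps, seed=0):
    """Estimate T_ab as (elapsed time) / (number of red-to-green repaintings)."""
    rng = random.Random(seed)
    states = range(len(P))
    state, colour, repaints = a, "green", 0
    for _ in range(steps):
        state = rng.choices(states, weights=P[state])[0]
        if state == a and colour == "red":
            colour, repaints = "green", repaints + 1   # red particle painted green at a
        elif state == b:
            colour = "red"                             # particle painted red at b
    return steps / repaints

estimate = estimate_commuting_time(P, a=0, b=1, steps=200_000)
```

The estimate fluctuates from run to run, so it should only be expected to land near the exact value, with an error that shrinks as the number of steps grows.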

Note. This painting business is very close to a model developed by King- 
man [8j and Kelly ^. (See exercise 1 in section 3.3 of Doyle and Snell |1].) 
However, I don't know that Kingman and Kelly ever made the connection
to commuting times, and it is possible that their discussion concerned only 
time-reversible chains. Somebody should check this. 

It is high time to observe that if $\hat{T}_{ab}$ is the commuting time for the
time-reversed chain (according to the general convention that time-reversed
quantities wear hats), we have

$$T_{ab} = T_{ba} = \hat{T}_{ab} = \hat{T}_{ba}.$$

We claim to be able to see this from our way of approximating $T_{ab}$ by
observing the chain over a long time. If we reverse a record of the chain moving
forward for a long time, we see roughly a record of the time-reversed chain
starting in equilibrium. In fact if we started the original chain in equilibrium
we're golden. If we started the chain not in equilibrium (e.g. by starting at
a, as we might well be tempted to do), there will be problems toward the 
end of the time-reversed record, as the time-reversed chain gets drawn to end 
where the forward chain began. But this effect is negligible when T is large. 

8 The Laplacian and the cross-potential 

Consider a discrete-time Markov chain with transition probabilities

$$P_i{}^j = \mathrm{Prob}(\text{next at } j \mid \text{start at } i).$$

Assume the chain is ergodic, so that there is a unique equilibrium measure
$w^i$ with

$$\sum_i w^i P_i{}^j = w^j, \qquad \sum_i w^i = 1.$$

Define the Laplacian

$$\Delta^{ij} = w^i (I_i{}^j - P_i{}^j).$$

For $i \ne j$, $-\Delta^{ij}$ tells the equilibrium rate of transitions from $i$ to $j$; $\Delta^{ii}$
tells the total rate of transitions to and from states other than $i$. The
time-reversed Markov chain has Laplacian $\hat{\Delta}^{ij} = \Delta^{ji}$. A time-reversible chain has

$$\Delta^{ij} = \Delta^{ji}.$$

We have

$$\sum_i \Delta^{ij} = \sum_j \Delta^{ij} = 0.$$

So considered as a matrix, $\Delta^{ij}$ is not invertible. However, it has a generalized
inverse $Z_{ij}$ with the property that for any measure of total mass 0, which is
to say for any $u^i$ with $\sum_i u^i = 0$, we have

$$\sum_{jk} u^j Z_{jk} \Delta^{ki} = u^i$$

and

$$\sum_{jk} \Delta^{ij} Z_{jk} u^k = u^i.$$

An equivalent way to write this is

$$\sum_{jk} \Delta^{ij} Z_{jk} \Delta^{kl} = \Delta^{il},$$

because if we think of $\Delta^{ij}$ as a matrix, its rows and columns both span the
space of measures with total mass 0.

A sensible choice for the generalized inverse $Z_{ij}$ is

$$Z_{ij} = \frac{1}{w^j} Z_i{}^j,$$

with

$$Z_i{}^j = (I_i{}^j - w^j) + (P_i{}^j - w^j) + (P^{(2)}{}_i{}^j - w^j) + \ldots,$$

where $P^{(2)}{}_i{}^j = \sum_k P_i{}^k P_k{}^j$ represents the matrix square of $P_i{}^j$, and the elided
terms involve higher matrix powers. Define $P^{(\infty)}{}_i{}^j = w^j$, to suggest that the
'infinitieth power' of $P_i{}^j$ has all rows equal to the vector $w^i$. We can write

$$Z = (I - P^{(\infty)}) + (P - P^{(\infty)}) + (P^{(2)} - P^{(\infty)}) + \ldots
   = (I - P + P^{(\infty)})^{-1} - P^{(\infty)}.$$

This naturally translates into the formula we've given for $Z_i{}^j$, and from there,
by 'lowering the index $j$', we get $Z_{ij}$.

For this choice of $Z$ we have the natural interpretation that $Z_i{}^j$ is the
expected excess number of visits to $j$ for a chain starting at $i$ compared to a
chain starting in equilibrium. For the time-reversed chain we get

$$\hat{Z}_{ij} = Z_{ji},$$

and so in particular if the chain is time-reversible we have $Z_{ij} = Z_{ji}$.

This is all very well, but we still do not want to prescribe this particular
choice of $Z$ because it is not conformally invariant: It depends on the
equilibrium measure $w^i$, and not just on the Laplacian 'matrix' $\Delta^{ij}$. This makes
it insufficiently canonical for us.

What is canonical is the bilinear form

$$B(u,v) = \sum_{ij} u^i Z_{ij} v^j$$

when $u$ and $v$ are restricted to the subspace $S$ of measures of total mass 0:

$$S = \{u^i : \sum_i u^i = 0\}.$$

Fixing $a$, $b$, $c$, $d$ and setting

$$u = \delta_a - \delta_b, \qquad v = \delta_c - \delta_d$$

gives us the cross-potential

$$N_{abcd} = B(\delta_a - \delta_b, \delta_c - \delta_d) = Z_{ac} - Z_{ad} - Z_{bc} + Z_{bd}.$$

$N$ satisfies

$$N_{bacd} = N_{abdc} = -N_{abcd}.$$

For the time-reversed process

$$\hat{N}_{abcd} = N_{cdab}.$$

Clearly, knowing $N$ is the same as knowing $B$, or $\Delta$. If we know $w$ as
well as $N$ we can recover our sensible-but-not-canonical $Z$:

$$Z_{ij} = \sum_{kl} N_{ikjl} w^k w^l.$$



Different choices of w in this formula lead to different Z's, but they all de- 
termine the same bilinear form B. From Z and w we can recover P. 
In general, it is useful to think of an ergodic Markov chain as specified by
the cross-potential $N$, which determines its conformally invariant properties,
together with the equilibrium measure $w$. Expressing formulas in these terms
allows us to see the extent to which quantities are conformally invariant (like
$N$, $B$, and $\Delta$) or not (like $w$, $Z$, $P$).

Complaint. $N$ and $w$ together don't quite determine the original
transition rates for a continuous-time Markov chain, or rather, they wouldn't
do so if we had some way to distinguish between remaining at i and mov- 
ing from i to i. Such a distinction is not possible for discrete-time chains 
represented by matrices, but we could handle it in the continuous case by 
allowing for non-zero transition rates on the diagonal. Better yet, we could 
reformulate Markov chain theory in the context of queuing networks based 
on 1-complexes (graphs where loops and multiple edges are allowed). This 
would give us a way to distinguish different ways of stepping from i to j. A 
further step would be to allow a general distribution for the time it takes to 
make a transition from $i$ to $j$. This would be very helpful when watching the
chain only when it is in a subset of its states, as in the case above where we 
contemplated watching the chain only when it is at a or b. We didn't say 
just what we meant by this, because it doesn't conveniently fit into the usual 
formulation of Markov chain theory. 

9 Probabilistic and electrical interpretation 

We may interpret $N_{abcd}$ probabilistically as the equilibrium concentration
difference between $c$ and $d$ due to a unit flow of particles entering at $a$ and
leaving at $b$. Here's what this means. Introduce Markovian particles at $a$ at
a unit rate, and remove them when they reach $b$. Write the 'dynamic
equilibrium' measure of particles at $i$ as $w^i \phi_i$, so that $\phi_i$ tells the concentration
of particles relative to the 'static equilibrium' measure $w^i$. Conservation of
particles implies that

$$w^j \phi_j = \sum_i w^i \phi_i P_i{}^j + \delta_a^j - \delta_b^j.$$

We hasten to rewrite this in the conformally correct form

$$\sum_i \phi_i \Delta^{ij} = \delta_a^j - \delta_b^j.$$

Since also

$$\sum_i (Z_{ai} - Z_{bi}) \Delta^{ij} = \delta_a^j - \delta_b^j$$

and since the Laplacian $\Delta$ kills only constants, it follows that

$$\phi_i = Z_{ai} - Z_{bi} + \text{const},$$

and thus

$$\phi_c - \phi_d = Z_{ac} - Z_{bc} - Z_{ad} + Z_{bd} = N_{abcd}.$$



From this probabilistic interpretation of $N$ we can see that $N_{abab} = C_{ab}$,
the commuting time between $a$ and $b$. Indeed, in the particle-painting
scenario introduced earlier, $C_{ab}$ is the reciprocal of the rate at which red particles
are turning green at $a$. Paying attention only to green particles, we see green
particles appearing at $a$ at rate $1/C_{ab}$, and disappearing at $b$. The
equilibrium concentration of green particles at $i$ is the probability $p_i$ of hitting $a$
before $b$ for the time-reversed chain, and in particular $p_a = 1$ and $p_b = 0$, so
the concentration difference between $a$ and $b$ is 1. Multiplying the green flow
by $C_{ab}$ normalizes it to a unit flow with concentration difference $C_{ab}$ between
$a$ and $b$. So

$$C_{ab} = N_{abab}.$$

If we embellish this probabilistic scenario by imagining that our parti- 
cles carry a positive charge, we may identify the net flow of particles with 
electrical current; the concentration of particles (relative to the equilibrium 
measure) with electrical potential; and differences of concentration with
voltage drop. With this terminology, $N_{abcd}$ tells the voltage drop between $c$ and
$d$ due to a unit current from $a$ to $b$. Traditionally this way of talking is
reserved for time-reversible Markov chains, which are precisely those for which
we have the 'reciprocity law' $N_{abcd} = N_{cdab}$. For such chains, if we build a
resistor network where nodes $i \ne j$ are joined by a resistor of conductance
(i.e., reciprocal resistance) $-\Delta^{ij}$, then $N_{abcd}$ will indeed be the voltage drop
between c and d due to a unit current from a to b. We propose to extend 
this way of talking to non-time-reversible chains. 

In electrical terms, the voltage drop Nabab between a and b due to a 
unit current between a and b is the effective resistance. This is the same as 
the reciprocal of the current that flows when a 1-volt battery is connected up 
between a and b — which is what we get in effect when we measure commuting 
times using green and red paint. So the commuting time Cab = Nabab is the 
same as the effective resistance between a and b. 
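
For a time-reversible chain this identification can be checked by hand on a triangle. The sketch below assumes a hypothetical random walk on a weighted triangle (edge weights 1, 2, 3); the helper names are illustrative, and the effective resistance is computed with the elementary series and parallel rules rather than by any method from the text.

```python
from fractions import Fraction as F

def inverse(A):
    """Gauss-Jordan inverse of a square matrix (exact, Fraction entries)."""
    n = len(A)
    M = [list(r) + [F(int(i == j)) for j in range(n)] for i, r in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)   # pivot row
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                k = M[r][c]
                M[r] = [x - k * y for x, y in zip(M[r], M[c])]
    return [r[n:] for r in M]

def equilibrium_and_Z(P):
    """Return (w, Z): equilibrium measure w^i and lowered-index Z_ij."""
    n = len(P)
    eye = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    G = inverse([[eye[i][j] - P[i][j] + 1 for j in range(n)] for i in range(n)])
    w = [sum(G[i][j] for i in range(n)) for j in range(n)]   # column sums
    H = inverse([[eye[i][j] - P[i][j] + w[j] for j in range(n)] for i in range(n)])
    return w, [[(H[i][j] - w[j]) / w[j] for j in range(n)] for i in range(n)]

# Hypothetical symmetric edge weights on a triangle; the walk moves along an
# edge with probability proportional to its weight, which makes it reversible.
W = [[F(0), F(1), F(2)],
     [F(1), F(0), F(3)],
     [F(2), F(3), F(0)]]
P = [[W[i][j] / sum(W[i]) for j in range(3)] for i in range(3)]
w, Z = equilibrium_and_Z(P)

# Commuting time between states 0 and 1, from Z.
T01 = Z[0][0] - Z[0][1] - Z[1][0] + Z[1][1]

# Resistor between i and j has conductance -Delta^{ij} = w^i P_i^j.
r = {(i, j): 1 / (w[i] * P[i][j]) for (i, j) in [(0, 1), (0, 2), (1, 2)]}
# Effective resistance 0-1: the direct resistor in parallel with the path 0-2-1.
R01 = 1 / (1 / r[(0, 1)] + 1 / (r[(0, 2)] + r[(1, 2)]))
```

On this example the commuting time `T01` and the effective resistance `R01` come out equal, as the electrical interpretation predicts.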
The connection of commuting time to effective resistance, and the general
recognition that commuting times play a key role in understanding Markov 
chains, is due to Chandra et al. [1]. 

Note. Now we are in a position to understand the significance of the
name 'cross-potential'. This name is meant to indicate the connection of
$N_{abcd}$ to the cross-ratio of complex function theory. If we extend our notions
about Markov chains to cover Brownian motion on the Riemann sphere, we
get

$$N_{abcd} = -\frac{1}{2\pi} \left( \log|a-c| - \log|a-d| - \log|b-c| + \log|b-d| \right)
 = -\frac{1}{2\pi} \log \left| \frac{a-c}{a-d} \, \frac{b-d}{b-c} \right|
 = -\frac{1}{2\pi} \Re \log \left( \frac{a-c}{a-d} \, \frac{b-d}{b-c} \right).$$

We don't have to specify a metric on the sphere here, because the Laplacian
is a conformal invariant in two dimensions. Thinking of the sphere as being
an electrical conductor with constant conductivity (say, 1 mho 'per square'),
the electrical interpretation becomes exact. The advantage of having $N$
take four 'arguments' now becomes apparent, because $N_{abab} = \infty$. That's
why engineers look for cracks in nuclear reactor cooling pipes with a
4-point probe. To get a sensible generalization of $C_{ab}$ we will need to
do some kind of renormalization, which will introduce a dependence on the
metric. We should not be sorry about this, because it brings curvature into
the picture, and you know that can't be bad.

10 Realization 

Now, finally, to realize commuting times as squared distances. From the
bilinear form $B$ we get the quadratic form

$$Q(u) = \|u\|^2 = B(u,u) = \sum_{ij} u^i Z_{ij} u^j.$$

$$C_{ab} = N_{abab} = Q(\delta_a - \delta_b) = \|\delta_a - \delta_b\|^2.$$

So if we map $i$ to $\delta_i$ then the commuting time $C_{ab}$ becomes the squared
distance between the images in the $Q$-norm.

That is, if what we're calling the $Q$-norm is indeed a norm. Is $Q$ really
positive definite?

To understand better what is going on here, it is useful to look at the
bilinear form

$$L(\phi,\psi) = \sum_{ij} \phi_i \Delta^{ij} \psi_j,$$

where we think of $\phi$ and $\psi$ as being defined only modulo additive constants.
If we think of $\phi_i$ as the potential of the measure

$$\sum_i \phi_i \Delta^{ij}$$

then this is the same bilinear form as before, except that now instead of
measures of total mass 0 it takes as its arguments the corresponding potentials,
the first with respect to the original chain, and the second with respect to
the time-reversed chain:

$$B\Big(\sum_i \phi_i \Delta^{ij}, \sum_i \Delta^{ji} \psi_i\Big) = \sum_{ij} \phi_i \Delta^{ij} \psi_j = L(\phi,\psi).$$

This follows from the formula $\Delta Z \Delta = \Delta$ above.

Now to get the equivalent of $Q$ in this context we restrict to the subspace

$$V = \Big\{(\phi,\psi) : \sum_i \phi_i \Delta^{ik} = \sum_j \Delta^{kj} \psi_j \Big\}$$

and take as our quadratic form

$$R((\phi,\psi)) = L(\phi,\psi).$$

In the case of a time-reversible chain, $V$ is just the diagonal $\phi = \psi$, and

$$Q(\phi\Delta) = R((\phi,\phi)) = L(\phi,\phi) = \sum_{ij} \phi_i \Delta^{ij} \phi_j = \frac{1}{2} \sum_{ij} (-\Delta^{ij})(\phi_i - \phi_j)^2.$$

This is evidently positive-definite. Indeed, if we associate to $(\phi,\phi)$ the vector
with $\binom{n}{2}$ coordinates $\sqrt{-\Delta^{ij}}\,(\phi_i - \phi_j)$, $i < j$, then we will have embedded
the normed space $(V,R)$, and along with it our Markov chain, in Euclidean
$\binom{n}{2}$-space.

Electrically, what we have done here is to account for the energy being
dissipated in the network by adding up the energy dissipated by individual
resistors. And there should be some kind of probabilistic interpretation as
well.

That's how it works for time-reversible chains, for which $\Delta^{ij} = \Delta^{ji}$.
However, the argument extends to the general case by what amounts to
a trick. The key is the observation that for $(\phi,\psi) \in V$ we have

$$L(\phi,\psi) = L(\phi,\phi).$$

(But please note that in general $L(\phi,\psi) \ne L(\psi,\phi)$!) So

$$Q(\phi\Delta) = R((\phi,\psi)) = L(\phi,\psi) = L(\phi,\phi) = \sum_{ij} \phi_i \Delta^{ij} \phi_j = \frac{1}{2} \sum_{ij} (-\Delta^{ij})(\phi_i - \phi_j)^2.$$

So there is the positive-definiteness we need.

Now, though, we don't see any natural way to interpret the terms of the
sum electrically or probabilistically. (Which is not to say that there isn't
one!) In putting $\phi$ in both slots of $L$ we leave the subspace $V$, and thereby
commit what appears to be an unnatural act. But it seems to have paid off.

11 Minimax characterization of commuting 
times and hitting probabilities 

Fix states $a \ne b$, and let

$$S_{a,b} = \{\phi \mid \phi_a = 1, \phi_b = 0\}.$$

Here we really should be thinking of $\phi$ as being defined only up to an additive
constant, which means we should write $\phi_a - \phi_b = 1$, but we're going to be
sloppy about this, because we want to focus attention on two distinguished
elements of $S_{a,b}$ which are naturally 1 at $a$ and 0 at $b$. These are $\tilde\phi$ and
$\tilde\psi$, given by

$$\tilde\phi_i = \mathrm{Prob}(\text{hit } a \text{ before } b \text{ starting at } i \text{ going backward in time})$$

and

$$\tilde\psi_i = \mathrm{Prob}(\text{hit } a \text{ before } b \text{ starting at } i \text{ going forward in time}).$$

We've met $\tilde\phi$ before: It's proportional to the equilibrium concentration of
green particles in our painting scenario. $\tilde\psi$ is the analogous quantity for the
reversed chain. The pair $(\tilde\phi, \tilde\psi)$ belongs to our subset $V$, because

$$(\tilde\phi \Delta)^i = (\Delta \tilde\psi)^i = r_{ab} (\delta_a^i - \delta_b^i).$$

Here we once again are writing $r_{ab} = 1/T_{ab}$ for the equilibrium rate of
commuting between $a$ and $b$. Observe that for any $f$ we have

$$L(\tilde\phi, f) = L(f, \tilde\psi) = r_{ab} (f_a - f_b).$$

So whenever $f$ is in $S_{a,b}$ we have

$$L(\tilde\phi, f) = L(f, \tilde\psi) = r_{ab},$$

and in particular

$$L(\tilde\phi, \tilde\psi) = r_{ab}.$$

Theorem.

$$\frac{1}{T_{ab}} = r_{ab} = \min_\alpha \max_{\phi + \psi = 2\alpha} L(\phi,\psi).$$

Here and below, $\alpha$, $\phi$, and $\psi$ are restricted to lie in $S_{a,b}$, i.e. to take value 1
at $a$ and 0 at $b$.

Proof. Whatever $\alpha$ is, we may take $\phi = \tilde\phi$ (and thus $\psi = 2\alpha - \tilde\phi$), and
have

$$L(\phi,\psi) = L(\tilde\phi, \psi) = r_{ab}$$

as above. So

$$\min_\alpha \max_{\phi + \psi = 2\alpha} L(\phi,\psi) \ge r_{ab}.$$

To prove the inequality in the other direction, and in the process identify
where the minimax is achieved, take

$$\alpha = \frac{1}{2}(\tilde\phi + \tilde\psi).$$

If $\phi + \psi = 2\alpha$ then we can write

$$\phi = \tilde\phi + f$$

and

$$\psi = \tilde\psi - f,$$

where $f_a = f_b = 0$.

Now

$$L(\tilde\phi, f) = L(f, \tilde\psi) = r_{ab} (f_a - f_b) = 0,$$

so

$$L(\phi,\psi) = L(\tilde\phi + f, \tilde\psi - f) = L(\tilde\phi, \tilde\psi) - L(f,f) = r_{ab} - L(f,f).$$

And even though we claim it is a travesty to put the same $f$ into both slots
of $L$, we still have

$$L(f,f) \ge 0:$$

That was the upshot of our embedding investigation. So

$$L(\phi,\psi) \le r_{ab},$$

still assuming $\alpha = \frac{1}{2}(\tilde\phi + \tilde\psi)$ and $\phi + \psi = 2\alpha$. Hence

$$\min_\alpha \max_{\phi + \psi = 2\alpha} L(\phi,\psi) \le r_{ab}. \qquad \blacksquare$$

In the time-reversible case, where $\Delta^{ij} = \Delta^{ji}$, this minimax can be reduced
to a straight minimum. That's because in this case for any $g$, $f$ we have
$L(f,g) = L(g,f)$, and hence

$$L(g + f, g - f) = L(g,g) - L(f,f).$$

So to maximize $L(\phi,\psi)$ while fixing the sum $\phi + \psi = 2\alpha$ we take $\phi = \psi = \alpha$.
Corollary. When $\Delta^{ij}$ is symmetric

$$r_{ab} = \min_\phi L(\phi,\phi). \qquad \blacksquare$$

This minimum principle for resistances was known already to 19th century 
physicists, specifically Thomson (a.k.a. Kelvin), Maxwell, and Rayleigh: For 
more about this, see Doyle and Snell |1]. 

Having a straight minimum is a lot better than having a minimax, because
now we can plug in any $\phi$ with $\phi(a) = 1$, $\phi(b) = 0$ and get an upper bound
for $r_{ab}$, corresponding to a lower bound for $T_{ab}$. This method is a staple of
electrical theory: the part of electrical theory that doesn't extend to
non-time-reversible chains because it depends on the relation $L(f,g) = L(g,f)$.

For time-reversible chains there are also complementary methods for
finding lower bounds for $r_{ab}$, and thus upper bounds for $T_{ab}$. These emerge from
the minimum principle through the mystery of convex duality. In practice, 
though, it is generally conceptually simpler to work instead with the mono- 
tonicity law described in the next section. This monotonicity law extends to 
all chains, but sadly, for all we can tell thus far, its usefulness appears to get 
left behind. 
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12 Monotonicity 

From the miniinax characterization of commuting times we immediately get 
the following: 

Monotonicity Law. Commuting times decrease monotonically when equilibrium
interstate transition rates increase: Using barred and unbarred quantities
to refer to two different Markov chains, if $\bar{A}^{ij} \le A^{ij}$ for all $i \ne j$ then
$T_{ij} \le \bar{T}_{ij}$ for all $i, j$. ■

Actually it would be better to think of $A$ and $\bar{A}$ here as referring to
conformal classes of chains, rather than individual chains, because as we
know $A^{ij}$ and $T_{ij}$ are conformal invariants.
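
The law can be spot-checked numerically on a small non-time-reversible example. The Python sketch below is only an illustration under stated assumptions: the two flow matrices `A_bar` and `A` are made up, each conformal class is represented by the chain with uniform measure $w^i = W$ (the slack becoming a self-loop), and $T_{ij}$ is taken to be the commuting time in steps divided by the total mass of $w$. That normalization is our assumption for the example, not something taken from the text.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[F(v) for v in row] + [F(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def hitting_times(P, target):
    """Expected steps to reach `target` from each state (0 at the target itself)."""
    others = [i for i in range(len(P)) if i != target]
    A = [[(1 if i == j else 0) - P[i][j] for j in others] for i in others]
    m = dict(zip(others, solve(A, [1] * len(others))))
    return [m.get(i, F(0)) for i in range(len(P))]

def normalized_commute(A, a, b):
    """Commuting time in steps divided by the total mass of w (assumed normalization)."""
    n = len(A)
    rows = [sum(r) for r in A]
    cols = [sum(A[i][j] for i in range(n)) for j in range(n)]
    assert rows == cols, "flows must balance for w to be an equilibrium measure"
    W = max(rows)                 # uniform measure w_i = W; the slack becomes a self-loop
    P = [[F(A[i][j], W) if i != j else 1 - F(rows[i], W) for j in range(n)]
         for i in range(n)]
    return (hitting_times(P, b)[a] + hitting_times(P, a)[b]) / (n * W)

# Assumed examples of equilibrium transition rates A^{ij} (zero diagonal).
A_bar = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # pure rotation a -> b -> c -> a
A     = [[0, 2, 1], [1, 0, 2], [2, 1, 0]]   # same rotation plus a symmetric unit flow

pairs = [(0, 1), (0, 2), (1, 2)]
T_bar = {p: normalized_commute(A_bar, *p) for p in pairs}
T     = {p: normalized_commute(A, *p) for p in pairs}
for p in pairs:
    print(p, "T_bar =", T_bar[p], " T =", T[p])
```

Here `A_bar` is dominated by `A` off the diagonal, and each normalized commuting time drops from 1 to 3/7 when the extra symmetric flow is added.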

This law holds for all chains, time-reversible or not. As we said above, for 
time-reversible chains this law can be used to get upper and lower bounds 
for commuting times, and hence for hitting probabilities: This is discussed 
in great detail by Doyle and Snell [4].

Sadly, even though the law extends to the non-time-reversible case, its 
usefulness does not extend, at least not in any obvious way. How can this 
be? There seem to be a number of reasons. 

First, for time-reversible chains, if we block transitions back and forth
between states $c, d$, requiring the particle to remain where it is when it
attempts to make such a transition, we get a new $\bar{A}$ dominated by the original
$A$ in the sense that $\bar{A}^{ij} \le A^{ij}$ for $i \ne j$. Electrically speaking, blocking
transitions between $c$ and $d$ amounts to cutting the wire between them. In
the non-time-reversible case, this will change the equilibrium measure $w^i$ and
thereby destroy the relation $\bar{A}^{ij} \le A^{ij}$ that we need for monotonicity.
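
The contrast between the two cases can be seen on three states. The following Python sketch is a hypothetical illustration: the two transition matrices and the helper names are made up for the example, and the equilibrium measure is assumed to be normalized to total mass 1 when comparing flows $w^i P_i^j$ before and after blocking.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[F(v) for v in row] + [F(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def stationary(P):
    """Equilibrium probability vector w with w P = w and sum(w) = 1."""
    n = len(P)
    eqs = [[P[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n - 1)]
    eqs.append([1] * n)
    return solve(eqs, [0] * (n - 1) + [1])

def flows(P):
    """Equilibrium transition rates w^i P_i^j, with w normalized to total mass 1."""
    w = stationary(P)
    return [[w[i] * P[i][j] for j in range(len(P))] for i in range(len(P))]

def block(P, c, d):
    """Forbid c <-> d transitions; a blocked attempt leaves the particle in place."""
    Q = [row[:] for row in P]
    Q[c][c], Q[c][d] = Q[c][c] + Q[c][d], F(0)
    Q[d][d], Q[d][c] = Q[d][d] + Q[d][c], F(0)
    return Q

def dominated(A_new, A_old):
    """True if every off-diagonal entry of A_new is at most that of A_old."""
    n = len(A_old)
    return all(A_new[i][j] <= A_old[i][j]
               for i in range(n) for j in range(n) if i != j)

# Assumed examples: a reversible walk on a weighted triangle, and a biased rotation.
reversible = [[F(0), F(1, 3), F(2, 3)], [F(1, 3), F(0), F(2, 3)], [F(1, 2), F(1, 2), F(0)]]
rotating   = [[F(0), F(2, 3), F(1, 3)], [F(1, 3), F(0), F(2, 3)], [F(2, 3), F(1, 3), F(0)]]

for name, P in [("reversible", reversible), ("rotating", rotating)]:
    Q = block(P, 0, 1)
    print(name, stationary(P), "->", stationary(Q),
          "dominated:", dominated(flows(Q), flows(P)))
```

In the reversible example the equilibrium measure is untouched and the blocked flows are dominated by the original ones. In the rotating example the measure shifts, and at least one flow increases under the assumed normalization.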

Second, for time-reversible chains, it is simple and natural to introduce 
intermediate states. Electrically speaking, introducing a state between c and 
d amounts to dividing the 'wire' connecting c and d into two pieces, if only in 
our mind's eye. By combining this with the putting or taking of wires, we can 
produce chains to bound Tab above or below as closely as we please. And we 
can do this in such a way that our approximating chains are easy to analyze. 
Here lies the third apparent shortcoming of the non-time-reversible case: A 
seeming paucity of chains whose commuting times are easy to compute. 

So, of what use is this monotonicity law in the non-time-reversible case? 
That remains to be seen.

13 The obstruction to time-reversibility

Let $M_{ij}$ be the expected time to reach $j$ starting from $i$. Coppersmith, Tetali,
and Winkler showed that a Markov chain is time-reversible just if for all $a, b, c$

$$M_{ab} + M_{bc} + M_{ca} = M_{ac} + M_{cb} + M_{ba}.$$

And in this case the expected time to traverse a cycle of any length will be the
same in either direction. Note that although the $M_{ij}$s themselves are not
conformally invariant, these cycle sums are. For a cycle of length 2, the cycle
sum is our best friend the commuting time.
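
We always have

$$M_{ab} + M_{bc} + M_{ca} = \bar{M}_{ac} + \bar{M}_{cb} + \bar{M}_{ba}$$

(look at a long record of the chain backwards), where barred quantities refer
to the time-reversed chain, so an equivalent condition is that for all $a, b, c$

$$M_{ab} + M_{bc} + M_{ca} = \bar{M}_{ab} + \bar{M}_{bc} + \bar{M}_{ca}.$$

This is true despite the fact that in general

$$M_{ab} \ne M_{ba}.$$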
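
The cycle-sum criterion is easy to try out on small chains. The Python sketch below is a minimal illustration, with made-up example chains and helper names. It compares the sum of hitting times around a 3-cycle in both directions, and also checks the backward-record identity using the time-reversed chain, assumed here to have the standard transition probabilities $w^j P_j^i / w^i$.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[F(v) for v in row] + [F(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def hitting_times(P, target):
    """Expected steps to reach `target` from each state (0 at the target itself)."""
    others = [i for i in range(len(P)) if i != target]
    A = [[(1 if i == j else 0) - P[i][j] for j in others] for i in others]
    m = dict(zip(others, solve(A, [1] * len(others))))
    return [m.get(i, F(0)) for i in range(len(P))]

def cycle_sum(P, cycle):
    """Sum of expected hitting times around the closed cycle of states."""
    return sum(hitting_times(P, j)[i] for i, j in zip(cycle, cycle[1:] + cycle[:1]))

def reverse(P, w):
    """Time-reversed chain: P_rev[i][j] = w_j * P[j][i] / w_i."""
    n = len(P)
    return [[w[j] * P[j][i] / w[i] for j in range(n)] for i in range(n)]

# Assumed examples, each given with its equilibrium probability vector.
reversible = [[F(0), F(1, 3), F(2, 3)], [F(1, 3), F(0), F(2, 3)], [F(1, 2), F(1, 2), F(0)]]
w_reversible = [F(3, 10), F(3, 10), F(2, 5)]
rotating = [[F(0), F(2, 3), F(1, 3)], [F(1, 3), F(0), F(2, 3)], [F(2, 3), F(1, 3), F(0)]]
w_rotating = [F(1, 3)] * 3

results = {}
for name, P, w in [("reversible", reversible, w_reversible),
                   ("rotating", rotating, w_rotating)]:
    forward = cycle_sum(P, [0, 1, 2])               # a -> b -> c -> a
    backward = cycle_sum(P, [2, 1, 0])              # a -> c -> b -> a
    reversed_backward = cycle_sum(reverse(P, w), [2, 1, 0])
    results[name] = (forward, backward, reversed_backward)
    print(name, "forward:", forward, "backward:", backward,
          "backward in reversed chain:", reversed_backward)
```

For the reversible example the two directions agree. For the biased rotation they differ, while the forward sum still matches the backward sum computed in the reversed chain.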

So, why is this true? It comes down to the fact that a conformal class
of chains is reversible just if our bilinear form $L(\phi, \psi)$ on
$V = \{x^i \mid \sum_i x^i = 0\}$ is symmetric. To any bilinear form
$\sum_{ij} u^i Z_{ij} v^j$ on $V$ there corresponds a natural cohomology class

$$Z_{ij} - Z_{ji},$$

which is to say, an antisymmetric matrix defined up to addition of a matrix of
the form $B_{ij} = a_i - a_j$. This class represents the obstruction to symmetrizing
the matrix of the form within its ab-equivalence class. This class vanishes
just if it integrates to 0 around any cycle, and cycles of length 3 span the
space of cycles. Indeed, they span it in a very redundant way. To verify
reversibility, it would suffice to check any basis for the space of cycles, e.g.
only cycles of length 3 involving the fixed state $n$ (the 'ground').

14 More to be said

The next step would be to discuss how to use the knee-jerk mapping to make a
chain time-reversible without changing its commuting times. The knee-jerk
method will produce the desired time-reversible chain whenever such a thing
exists, but we still don't know if this is always the case. What we do know is
that if it turns out that no suitable time-reversible chain exists, the knee-jerk
method will deliver a time-reversible chain whose commuting times agree as
well as possible with those of the original chain. (See Coppersmith et al. [2],
Doyle [3].)

Then we should discuss uniformization of Markov chains, whereby we
prescribe a canonical representative chain within each conformal class (or in
other words, we prescribe a canonical $w^i$ to accompany a given $A^{ij}$). This
canonical chain extremizes the Kemeny constant $K$, which is the expected
time to hit a point chosen according to the equilibrium distribution. (As
Kemeny observed, $K$ doesn't depend on where you start.) The extremal
chain is characterized by constancy of the expected time $K_i$ to hit $i$
starting from equilibrium (the so-called 'preKemeny non-constant'). It's easy to
write down the transition probabilities for this extremal chain. But, are they
necessarily positive?
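
Both quantities in this paragraph can be computed directly for a small chain. The Python sketch below is an illustration only: the example chain and helper names are assumptions, and hitting times follow the convention $M_{ii} = 0$. It shows $K$ coming out the same from every starting state while the $K_i$ vary with $i$.

```python
from fractions import Fraction as F

def solve(A, b):
    """Solve the square system A x = b exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[F(v) for v in row] + [F(r)] for row, r in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def hitting_times(P, target):
    """Expected steps to reach `target` from each state (0 at the target itself)."""
    others = [i for i in range(len(P)) if i != target]
    A = [[(1 if i == j else 0) - P[i][j] for j in others] for i in others]
    m = dict(zip(others, solve(A, [1] * len(others))))
    return [m.get(i, F(0)) for i in range(len(P))]

# Assumed example chain with its equilibrium probability vector w.
P = [[F(0), F(1, 3), F(2, 3)],
     [F(1, 3), F(0), F(2, 3)],
     [F(1, 2), F(1, 2), F(0)]]
w = [F(3, 10), F(3, 10), F(2, 5)]
n = len(P)

# Sanity check that w really is in equilibrium: w P = w.
assert [sum(w[i] * P[i][j] for i in range(n)) for j in range(n)] == w

# M[i][j] = expected time to reach j starting from i.
columns = [hitting_times(P, j) for j in range(n)]
M = [[columns[j][i] for j in range(n)] for i in range(n)]

# Kemeny constant: start at i, aim for a target drawn from w.
kemeny = [sum(w[j] * M[i][j] for j in range(n)) for i in range(n)]
# K_i: start from equilibrium, aim for the fixed target i.
pre_kemeny = [sum(w[j] * M[j][i] for j in range(n)) for i in range(n)]

print("K from each start:", kemeny)
print("K_i for each target:", pre_kemeny)
```

Since the $K_i$ are not all equal here, this particular chain is not the extremal representative of its conformal class in the sense described above.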

Beyond this lies the extension of this whole business to diffusion on
surfaces, where we must renormalize hitting times because Brownian motion in
dimension 2 never hits a given point. (Cf. Doyle and Steiner [5].) Now to
uniformize we extremize not Kemeny's constant, but a variant with a
correction term involving the Gaussian curvature. Again, it is easy to write
down the extremizing metric, or rather the extremizing area measure, which 
is not a priori positive everywhere. For spheres, all the round metrics tie 
for the extremum. For tori, the flat metrics win. For higher genus surfaces, 
the winners are not hyperbolic surfaces, nor should they be, because having 
constant curvature is a local condition that doesn't know thick from thin. 
The canonical measure is sensitive to thickness in a conformally correct way. 
But is it a positive measure? If it isn't, could it still be good for something? 